GAME THEORETIC APPROACH TO POST-DOCKED SATELLITE CONTROL


Takashi Hiramatsu* and Norman G. Fitz-Coy†

University of Florida, Gainesville, FL 32611 


Abstract 

This paper studies the interaction between two satellites after docking. To maintain the docked state despite
uncertainty in the motion of the target vehicle, a game-theoretic controller based on the Stackelberg strategy is
considered to minimize the interaction between the satellites. A small-perturbation approximation leads to a linear
quadratic (LQ) differential game formulation, which is validated for the docking interaction between a service vehicle
and a target vehicle. The open-loop solutions are compared with those of the Nash strategy, and it is shown that the
Stackelberg strategy requires less control effort.


1 Background/Introduction 

Autonomous on-orbit robotic servicing has been studied recently with several architectures, such as Orbital Express
[1] (refueling), SUMO [2], and HEROS [3] (repair). These tasks include rendezvous and docking between a resident
space object (i.e., a disabled satellite) and the service vehicle, as shown in Fig. 1. During these operations, it is
possible that the disabled satellite is non-cooperative. This could be due to a loss of communication, a loss of
control actuation, or both. There are three possible non-cooperative scenarios between the target and the service
vehicle, and they can be considered in the framework of game theory: (i) the interaction is completely non-cooperative
such that each satellite acts independently of the other, (ii) the interaction is adversarial, where one satellite
tries to evade while the other pursues, and (iii) the interaction is partially non-cooperative, where one satellite's
action is non-cooperative (disabled satellite) while the other's is cooperative (service vehicle).

In game theory, the interactions mentioned above are classified as (i) Nash, (ii) Minimax, and (iii) Stackelberg
strategies [4]. In a Nash strategy, each player optimizes its own objective without consideration of the other players;
however, the resulting equilibrium solution may not be optimal for each individual player (e.g., the Prisoner's Dilemma).
In a Minimax strategy, each player assumes the worst case and tries to minimize the damage, and thereby obtains the
safest solution. In a Stackelberg strategy, the leader can enforce its action, and the resultant equilibrium
solution is always favorable to the leader. The Stackelberg leader always obtains a result that is better than or
equal to the result with a Nash strategy.
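The difference between the Nash and the Stackelberg outcomes can be illustrated with a small static game. The following is a minimal sketch, not part of this study: the two cost tables are hypothetical values chosen only so that the two solutions differ, and the function names are illustrative. It enumerates the pure-strategy Nash equilibria and the Stackelberg solution (leader commits first, follower best-responds) of a two-player cost game.

```python
# Hypothetical cost tables (lower is better), indexed [leader_action][follower_action].
LEADER_COST = [[2.0, 4.0],
               [1.0, 3.0]]
FOLLOWER_COST = [[0.0, 1.0],
                 [1.0, 0.0]]


def best_response(costs):
    """Index of the cheapest action in a list of costs (first one on a tie)."""
    return min(range(len(costs)), key=costs.__getitem__)


def pure_nash(leader_cost, follower_cost):
    """All action pairs from which neither player can lower its own cost alone."""
    rows, cols = len(leader_cost), len(leader_cost[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            leader_ok = all(leader_cost[i][j] <= leader_cost[k][j] for k in range(rows))
            follower_ok = all(follower_cost[i][j] <= follower_cost[i][k] for k in range(cols))
            if leader_ok and follower_ok:
                equilibria.append((i, j))
    return equilibria


def stackelberg(leader_cost, follower_cost):
    """Leader commits to an action; the follower then plays its best response."""
    outcomes = [(i, best_response(follower_cost[i])) for i in range(len(leader_cost))]
    return min(outcomes, key=lambda ij: leader_cost[ij[0]][ij[1]])


nash = pure_nash(LEADER_COST, FOLLOWER_COST)
lead = stackelberg(LEADER_COST, FOLLOWER_COST)
print("Nash:", nash, "leader cost:", [LEADER_COST[i][j] for i, j in nash])
print("Stackelberg:", lead, "leader cost:", LEADER_COST[lead[0]][lead[1]])
```

In this hypothetical game the leader's second action is dominant, so the only Nash equilibrium uses it. By committing to the first action instead, the leader steers the follower's response and ends with a lower cost, which is the advantage the Stackelberg leader has when the follower's best response is unique.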

The situation considered in this study is as follows: the service vehicle (SV), after capturing or docking with the
target (RSO: resident space object), tries to maintain its state without causing damage due to the interaction between
them. While the SV has the specific goal of tracking the motion of the RSO, the RSO has an arbitrary motion.
One way is to treat the RSO as the Stackelberg leader and the SV as the follower. Another way is to treat the SV
as the leader, since the SV may be able to estimate the motion of the RSO by prior observation. How well the
docked state is maintained can be determined by investigating the interaction between them: ideally there is
zero force and torque when two satellites are docked to form a completely rigid body, and any collision or detachment
will generate force and torque between them. In order to study the interaction between the SV and the RSO, a
control problem based on the two-person linear quadratic differential game is designed and analyzed to minimize the
interactive forces between the docked satellites assuming non-cooperative behavior. Small variations in the vicinity
of the nominal operating condition are considered in order to model the system as an LQ (linear state equation with
quadratic cost) differential game. When simplified dynamics are acceptable, the LQ differential game approach works
well and provides analytical solutions. The Nash and the Stackelberg solutions are compared with a classical LQR
controller, which assumes cooperation between the RSO and the SV (i.e., the RSO is not disabled). For simplicity, only
the open-loop strategies are considered.

Figure 1: An image of two satellites docking.

2 Method of Approach 

The docking-maintenance problem is formulated as a two-person differential game. First, two satellites moving in
coplanar circular orbits are modeled as rigid bodies, with the RSO as the leader and the SV as the follower. In the
numerical simulation, the case with reversed roles (the SV as the leader and the RSO as the follower) is also studied.
The interaction between them is modeled as a spring and a damper. Thus, a change in distance and velocity between them
produces the interaction forces, which are to be minimized. Each body is controlled by a force input and a torque input.
It is further assumed that the thrusts and torques are decoupled. A linear state-space representation of the system is

$$\dot{x}(t) = A x(t) + B_1 u_1(t) + B_2 u_2(t), \qquad x(0) = x_0 \tag{1}$$

The SV and the RSO minimize their cost functionals:

$$J_i = \frac{1}{2} x^T(t_f) K_{if} x(t_f) + \frac{1}{2} \int_0^{t_f} \Big( x^T Q_i x + \sum_{j} u_j^T R_{ij} u_j \Big) \, dt \tag{2}$$

where $i$ = SV, RSO, the sum runs over both players, and the final time $t_f$ may be finite (finite horizon) or
infinite (infinite horizon); in the latter case, $K_{1f} = K_{2f} = 0$. Open-loop linear-feedback control inputs are

$$u_1 = -R_{11}^{-1} B_1^T K_1 x \tag{3}$$

$$u_2 = -R_{22}^{-1} B_2^T K_2 x \tag{4}$$

and therefore the states are found by solving

$$\dot{x} = \left( A - B_1 R_{11}^{-1} B_1^T K_1 - B_2 R_{22}^{-1} B_2^T K_2 \right) x, \qquad x(0) = x_0 \tag{5}$$

Equations (3)-(5) hold for both Nash and Stackelberg strategies, while the gain matrices $K_1$ and $K_2$ are different.

2.1 Nash strategy 

In a game where each player plays the Nash strategy, the players try to minimize their own costs simultaneously. For a
two-person LQ differential game model like the one given by Eqs. (1)-(2), the optimality conditions are derived directly
through the calculus of variations. Starr [5] has studied the Nash solution of a two-person open-loop nonzero-sum linear
quadratic differential game. Assuming full-state feedback, the game defined by Eqs. (1)-(2) has unique open-loop Nash
strategies, where $K_1$ and $K_2$ are obtained by solving the set of coupled Riccati differential equations

$$\dot{K}_1 = -A^T K_1 - K_1 A - Q_1 + K_1 B_1 R_{11}^{-1} B_1^T K_1 + K_1 B_2 R_{22}^{-1} B_2^T K_2 \tag{6}$$

$$\dot{K}_2 = -A^T K_2 - K_2 A - Q_2 + K_2 B_1 R_{11}^{-1} B_1^T K_1 + K_2 B_2 R_{22}^{-1} B_2^T K_2 \tag{7}$$

where $K_1(t_f) = K_{1f}$ and $K_2(t_f) = K_{2f}$ for the finite-horizon case. For the infinite-horizon case, the
corresponding algebraic Riccati equations (Eqs. (6)-(7) with $\dot{K}_1 = \dot{K}_2 = 0$) are solved. The algorithms by
Engwerda [6] are used to solve these Riccati equations.
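For intuition, the infinite-horizon case can be worked out in closed form for a scalar system. The sketch below is an illustration only, not the algorithm of the cited work: it assumes the coupled algebraic equations take the scalar form $0 = -2 a k_i - q_i + k_i (s_1 k_1 + s_2 k_2)$ with $s_i = b_i^2 / r_{ii}$, and the function name and numerical values are hypothetical.

```python
import math


def scalar_open_loop_nash(a, b1, b2, q1, q2, r11, r22):
    """Scalar infinite-horizon open-loop Nash gains (illustrative sketch).

    Solves 0 = -2*a*k_i - q_i + k_i*(s1*k1 + s2*k2), with s_i = b_i**2 / r_ii.
    Requires s1*q1 + s2*q2 > 0 so that the denominator below is positive.
    """
    s1 = b1 ** 2 / r11
    s2 = b2 ** 2 / r22
    # With sigma = s1*k1 + s2*k2, each equation gives k_i = q_i / (sigma - 2*a),
    # so sigma solves sigma**2 - 2*a*sigma - (s1*q1 + s2*q2) = 0.
    # The positive-square-root branch is the one with a stable closed loop.
    sigma = a + math.sqrt(a ** 2 + s1 * q1 + s2 * q2)
    k1 = q1 / (sigma - 2.0 * a)
    k2 = q2 / (sigma - 2.0 * a)
    closed_loop_pole = a - sigma  # scalar version of A - S1*K1 - S2*K2
    return k1, k2, closed_loop_pole


# Hypothetical numbers: an unstable plant (a > 0) shared by two players.
k1, k2, pole = scalar_open_loop_nash(a=0.5, b1=1.0, b2=1.0, q1=1.0, q2=2.0, r11=1.0, r22=1.0)
print("k1 =", k1, "k2 =", k2, "closed-loop pole =", pole)
```

The closed-loop pole equals $-\sqrt{a^2 + s_1 q_1 + s_2 q_2}$, so the combined feedback stabilizes the scalar plant for any positive weights. In the matrix case no such closed form is available and a numerical solver is needed.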

2.2 Stackelberg strategy 

In a two-person Stackelberg game, one player is assigned as the leader and the other as the follower. The leader has
advantages, such as faster computation or more information than the follower, and therefore has the power to enforce
its action. Thus, the follower's optimal strategy is found by solving an optimal control problem that treats the
leader's strategy as a prescribed function. Once the solution to this tracking problem is obtained, the follower's
control is substituted into the leader's cost. Finally, the leader solves its optimal control problem knowing that
the follower will "follow" the leader's decision. One of the earliest complete solution methodologies for a Stackelberg
strategy in a two-person differential game was developed by Simaan [7]. Following [7] with the assumption of full-state
feedback, the unique open-loop Stackelberg solution with the SV as leader, which is exactly of the same form as Eqs.
(3)-(5), is obtained by solving the coupled Riccati-type differential equations


( 8 ) 

(9) 


It = -4^1 - 4t4 - a + ^tlJuttt + 


R_2 = -A K 2 - K 2 A - Q 2 


£i= 




K B R~ 1 B t K 
= 2 = 2=22 =2 =2 


l = AE~RA + mMuMfK t + PMa&gga 






(10) 


where $K_1(t_f) = K_{1f}$, $K_2(t_f) = K_{2f}$, and $P(t_0) = 0$ for the finite-horizon case. Similarly to the Nash
strategy, for the infinite-horizon case Eqs. (8)-(10) are solved with $P(0) = 0$ and
$\dot{K}_1 = \dot{K}_2 = \dot{P} = 0$. These Riccati-type equations are solved based on the work by Freiling [8].


3 Mathematical Modeling and Problem Formulation 

The following post-capture dynamics are considered. The two satellites previously shown in Fig. 1 are modeled as two
uniform rigid bodies, the target or resident space object (RSO) and the service vehicle (SV), as shown in Fig. 2.
The interaction between the SV and the RSO is modeled by a spring and a damper connecting them at the points P and Q,
such that a change in distance or velocity between P and Q produces the interaction forces, which are not necessarily
radial and therefore cause torques as well. In order to maintain docking, those forces and torques need to be minimized.
Each body is controlled by a force input and a torque input. It is further assumed that the thrusts and torques are
decoupled.
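The spring-damper interaction described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the model used in this study: the force is assumed to act along the line from P to Q, the default values of `k`, `c`, and `l0` are taken from Table 1, and the function and argument names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def interaction(p, q, vp, vq, arm_p, k=20.0, c=10.0, l0=0.3):
    """Force and torque on the body that carries attachment point P.

    p, q   : positions of the attachment points P and Q (m)
    vp, vq : velocities of P and Q (m/s)
    arm_p  : vector from that body's center of mass to P (m)
    Assumes a linear spring (k, natural length l0) and damper (c) along PQ.
    """
    d = [qi - pi for pi, qi in zip(p, q)]
    dist = math.sqrt(sum(x * x for x in d))
    unit = [x / dist for x in d]                 # direction from P toward Q
    stretch = dist - l0                          # spring elongation
    rate = sum((vqi - vpi) * ui for vpi, vqi, ui in zip(vp, vq, unit))  # elongation rate
    magnitude = k * stretch + c * rate
    force = [magnitude * ui for ui in unit]      # pulls P toward Q when stretched
    torque = cross(arm_p, force)                 # nonzero unless the force passes through the center of mass
    return force, torque


# Spring stretched by 0.1 m along x, attachment point offset 1 m along y.
force, torque = interaction([0, 0, 0], [0.4, 0, 0], [0, 0, 0], [0, 0, 0], [0, 1, 0])
print("force:", force, "torque:", torque)
```

When the separation equals the natural length and the relative velocity is zero, both the force and the torque vanish, which corresponds to the ideal docked state the controllers try to maintain.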


Figure 2: Two rigid bodies on circular orbits.


The translational motion of each body moving in a circular orbit is obtained from the two-body equation given by
[9] as

$$\ddot{r}_i + \frac{\mu}{\| r_i \|^3} r_i = \frac{1}{m_i} \{ F_i + u_i \}, \qquad i = \mathrm{SV}, \mathrm{RSO} \tag{11}$$

where $r_i$ denotes the position vector of the center of mass of each body, $F_i$ denotes the external force due to
the spring and the damper, and $u_i$ is the control force input. The rotational motion is governed by Euler's equation

$$I_i \dot{\omega}_i + \omega_i \times I_i \omega_i = M_i + \tau_i, \qquad i = \mathrm{SV}, \mathrm{RSO} \tag{12}$$

where $I_i$ is the inertia matrix, $\omega_i$ is the angular velocity, $M_i$ is the moment due to the spring and the
damper, and $\tau_i$ is the control torque input. The position vector of each body is


$$r_{SV} = \mathbf{R}_{SV_0} + \eta \tag{13}$$

$$r_{RSO} = \mathbf{R}_{RSO_0} + \zeta \tag{14}$$

where the nominal radii of the orbits of the SV and the RSO, coordinatized in their respective frames, are

$${}^{SV}\mathbf{R}_{SV_0} = [\, 0 \;\; 0 \;\; -R_{SV_0} \,]^T \tag{15}$$

$${}^{RSO}\mathbf{R}_{RSO_0} = [\, 0 \;\; 0 \;\; -R_{RSO_0} \,]^T \tag{16}$$

and the linear perturbations are coordinatized in the SV's frame as

$${}^{SV}\eta = [\, \eta_1 \;\; \eta_2 \;\; \eta_3 \,]^T \tag{17}$$

$${}^{SV}\zeta = [\, \zeta_1 \;\; \zeta_2 \;\; \zeta_3 \,]^T \tag{18}$$

Eqs. (11)-(12) are then linearized to yield the linear dynamic model (Eq. (1)). With the assumptions of small
perturbations and that the two satellites are on the same orbit (i.e., $R_{SV_0} = R_{RSO_0}$), since they are docked
and their bodies are small compared to the orbit, the coordinate frames are approximated to coincide. The relative
distance and attitude errors are defined as


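$$\delta r = [\, \delta x \;\; \delta y \;\; \delta z \,]^T = \zeta - \eta \tag{19}$$

$$\delta\theta = [\, \delta\theta_1 \;\; \delta\theta_2 \;\; \delta\theta_3 \,]^T = \beta - \alpha \tag{20}$$

and the state vector $x$ of the system is composed of the relative distance and attitude errors and their rates such that

$$x = [\, \delta r^T \;\; \delta\theta^T \;\; \delta\dot{r}^T \;\; \delta\dot{\theta}^T \,]^T \tag{21}$$

In the following numerical analysis, the states are coordinatized in the body axes of the SV, since the focus of this
study is to control the SV.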


4 Simulation and Results 


The simulation for an infinite-horizon case was run with the system parameters shown in Table 1, with the corresponding
state-space system where

$$A = \begin{bmatrix} 0_{6\times 6} & I_{6\times 6} \\ A_{21} & A_{22} \end{bmatrix} \tag{22}$$



- 1.852 

- 1.852 

— 1.852 

- 0.001 

- 1.482 

0.001 

- 0.926 

- 0.926 

— 0.926 

- 0.001 

- 0.741 

0.001 

- 0.926 

- 0.926 

- 0.926 

- 0.001 

- 0.741 

0.001 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0

(23)

and

$$A_{22} = \begin{bmatrix}
-0.185 & -0.275 & -0.185 & 0 & 0 & 0 \\
-0.188 & -0.278 & -0.185 & 0 & 0 & 0 \\
-0.185 & -0.185 & -0.185 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.001 & 0 \\
0 & 0 & 0 & -0.001 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \tag{24}$$

$$B_1 = \begin{bmatrix} 0_{6\times 6} \\ \mathrm{diag}\{-0.01, -0.01, -0.01, -0.0167, -0.0167, -0.0167\} \end{bmatrix} \tag{25}$$

$$B_2 = \begin{bmatrix} 0_{6\times 6} \\ \mathrm{diag}\{0.067, 0.067, 0.067, 0.01, 0.01, 0.01\} \end{bmatrix} \tag{26}$$

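The units represented in matrices $A$ and $B_i$ are linear distance in meters, mass in kg, and angular displacement
in radians. The initial conditions are:

$$x(0) = [\, \delta r^T \;\; \delta\theta^T \;\; \delta\dot{r}^T \;\; \delta\dot{\theta}^T \,]^T_0 = [\, 0.2 \;\; 0.2 \;\; 0.2 \;\; 0.05 \;\; 0.05 \;\; 0.05 \;\; -0.1 \;\; -0.1 \;\; -0.1 \;\; -0.1 \;\; -0.1 \;\; -0.1 \,]^T \tag{27}$$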
The cost functionals are chosen as 


£ = diag | 


% = -—I 


1 1 1 0.1 0.1 0.1 0.5 0 0 0 0 0 


£ll 

= diag | 

R 

= 1„ „ 

=12 

=6x6 

R 

= -R„ 

=21 

=n 

=22 

(M 

II 


0.1 0.1 111} 


(28) 

(29) 

(30) 

(31) 

(32) 

(33) 


Table 1: The system parameters used in the preliminary simulation.

c       = 10 Ns/m        R_SV0   = 6600.00 km
k       = 20 N/m         R_RSO0  = 6600.00 km
l_0     = 0.3 m          I_SV    = diag{60, 60, 60} kg-m^2
m_SV    = 150 kg         I_RSO   = diag{100, 100, 100} kg-m^2
m_RSO   = 200 kg         rho_SV  = [1 1 1]^T m
                         rho_RSO = [1.5 1.5 1.5]^T m
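As a quick check on the orbital scale implied by Table 1, the mean motion of a circular orbit with the listed radius can be computed from $n = \sqrt{\mu / R^3}$. This is a minimal sketch that assumes an Earth orbit (the central body is not stated explicitly here); the constant and function names are illustrative.

```python
import math

MU_EARTH = 3.986004418e14  # m^3/s^2, Earth's standard gravitational parameter (assumed central body)


def mean_motion(radius_m, mu=MU_EARTH):
    """Angular rate (rad/s) of a circular orbit of the given radius."""
    return math.sqrt(mu / radius_m ** 3)


R0 = 6600.0e3                   # nominal orbit radius from Table 1, in meters
n = mean_motion(R0)             # orbital angular rate, rad/s
period = 2.0 * math.pi / n      # orbital period, s
speed = n * R0                  # circular orbital speed, m/s
print("n = %.4e rad/s, period = %.0f s, speed = %.0f m/s" % (n, period, speed))
```

Under this assumption the orbital rate is on the order of $10^{-3}$ rad/s, the same order of magnitude as the $\pm 0.001$ entries that appear in the system matrices of this section.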


For the sake of comparison, the problem is also solved with the linear quadratic regulator (LQR) controller, assuming
that the interaction between the SV and the RSO is cooperative. If the objectives of both the SV and the RSO are to
minimize the interaction, a corresponding LQR problem can be constructed as

$$\dot{x} = A x + B u \tag{34}$$

and

$$J = \frac{1}{2} \int_0^{\infty} \left( x^T Q x + u^T R u \right) dt \tag{35}$$

where $B = [\, B_1 \;\; B_2 \,]$, $u = [\, u_1^T \;\; u_2^T \,]^T$, $Q$ is formed from $Q_1$ and $Q_2$, and
$R = \mathrm{diag}\{R_{11}, R_{22}\}$. Note that in order to be able to solve the LQR problem, $R$ must be invertible
and symmetric. For simplicity, the off-diagonal terms are chosen to be zero instead of $R_{12}$ and $R_{21}$. The
resultant trajectories are plotted in Fig. 3, and the control force and torque inputs are compared in Figs. 4-5.
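The cooperative LQR baseline can be illustrated on a scalar system, where the algebraic Riccati equation has a closed-form solution. The sketch below is an illustration only: it uses the cost $\int (q x^2 + r u^2)\,dt$ for $\dot{x} = a x + b u$ (a constant factor on the cost does not change the optimal gain), and the function name and numerical values are hypothetical rather than taken from the simulation.

```python
import math


def scalar_lqr(a, b, q, r):
    """Scalar infinite-horizon LQR for x' = a*x + b*u.

    Solves the algebraic Riccati equation 0 = 2*a*p + q - (b**2 / r) * p**2
    for its positive root and returns (p, feedback gain, closed-loop pole).
    """
    p = r * (a + math.sqrt(a ** 2 + b ** 2 * q / r)) / b ** 2
    gain = b * p / r            # optimal control is u = -gain * x
    pole = a - b * gain         # closed-loop dynamics x' = pole * x
    return p, gain, pole


p, gain, pole = scalar_lqr(a=0.5, b=1.0, q=3.0, r=1.0)
print("p =", p, "gain =", gain, "closed-loop pole =", pole)
```

The requirement that $R$ be invertible shows up here as the division by `r`: the gain is undefined when the control weight is zero. For the full matrix problem a numerical Riccati solver would be used instead of a closed form.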






Figure 3: The resultant trajectory. Panels: (a) LQR, (b) Nash, (c) RSO as leader, (d) SV as leader.

The resultant trajectories do not differ notably whether a player acts as the Stackelberg leader, plays the Nash
strategy, or uses LQR. The control efforts, on the other hand, change significantly. The control efforts are
smallest with LQR because it is based on the ideal situation in which the RSO is not disabled. In cases where we cannot
communicate with the RSO and do not know its motion, such cooperation is impossible, and thus LQR is not applicable.
Both the SV and the RSO have much smaller control force and torque inputs when playing as the Stackelberg leader than
as the Stackelberg follower. Also, the control efforts of the Stackelberg follower are almost the same as with the Nash
strategy. Note that the trajectories and the SV's control history represent the potential upper bound; the control
inputs of the RSO in Figs. 4-5 are imaginary actuation that causes the motion of the disabled RSO.

Figure 4: The control force inputs of the SV and the RSO versus time (s). Panels: (a) LQR, (b) Nash, (c) Stackelberg:
RSO as leader, (d) Stackelberg: SV as leader.

It is shown that the results using game-theoretic controllers are similar to those of the LQR controller, which assumes
cooperation between the SV and the RSO. These preliminary results confirm the feasibility of the game-theoretic
controller as well as the hypothesis that it is possible to obtain smaller actuation inputs for the leader of a
Stackelberg approach while obtaining nearly the same trajectory variables.


5 Conclusion 

It was shown that the interaction between the controlled service vehicle and the disabled target can be minimized with
the LQR and the game-theoretic controllers for a post-docked maneuver where the distance and the attitude errors are
kept small enough that the linear approximation is valid. Although the LQR controller worked the best, it was based on
an ideal case where the RSO was not disabled. Since the RSO being disabled is the characteristic of the problem, the LQR
controller is not practical. On the other hand, the game-theoretic approach, with the Nash or the Stackelberg strategy,
still kept the distance and the attitude errors small even with noncooperative behavior of the target RSO. Therefore,
the feasibility of the game-theoretic controllers in this noncooperative scenario was validated.

Figure 5: The control torque inputs. Panels: (a) LQR, (b) Nash, (c) Stackelberg: RSO as leader, (d) Stackelberg: SV as
leader.

It was also shown that the control effort of the service vehicle can be lowered by utilizing the Stackelberg strategy as
the leader; that is, if we are able to observe the motion of the RSO before docking and store enough information to
estimate how the RSO will behave in the near future, we can utilize the Stackelberg game-theoretic controller
to reduce the interaction with smaller control efforts.